home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Technotools
/
Technotools (Chestnut CD-ROM)(1993).ISO
/
misc_pto
/
b-trees
/
btrees.txt
Wrap
Text File
|
1990-01-08
|
9KB
|
186 lines
B-Trees
SCSC 420 - File Management
University of South Carolina
J. E. Clary
The concept of B-Trees grew from a competition by
leading computer manufacturers and independent research
groups in the late 1960's. This competition was to
develop general methods for storing and retrieving data
in large file systems that would provide rapid access to
the data with a minimal overhead cost. Among the
competitors were R. Bayer and E. McCreight, who were
employees of the Boeing Corporation at the time. In
1972, Bayer and McCreight published an article entitled
"Organization and Maintenance of Large Ordered Indexes"
which first exposed the world to the conception of B-
Tree organization.
The need for a new method to access data arose from
the long seek times associated with ordinary access
methods and the need to have a sorted index in order to
perform binary searches. The overhead of maintaining a
sorted index was deemed unacceptable so a new method was
needed.
One explored solution was to use binary tree. This
method would alleviate the need to resort the index with
each new entry or deletion, but it too had some inherent
problems. Because the binary tree is located on
secondary storage, the number of seeks necessary to
satisfy a search can be very high. A successful strategy
for this is to reduce the seeks by paging groups of
nodes in and out of memory.
Another difficulty with binary trees is that they
are constructed starting with the root and working down
through the nodes. After construction, it is possible
that the tree root could contain the wrong keys, making
it necessary to find ways to repair the problem with
rearrangement algorithms. Bayer and McCreight recognized
that the decision to construct the tree from the root
down, was in itself the problem. Rather than find a way
to undo the problems inherent to binary trees, they
concentrated on algorithms to construct the tree in a
useful form to start with. With B-Trees the root node is
allowed to emerge, rather than setting it up, and then
perhaps having to change it.
Each node of a B-Tree consists of an ordered
sequence of entries and some pointers. The number of
pointers is always equal to the number of entries + 1.
The maximum number of entries that can be stored in a
node is called the order of the B-Tree. When a node
becomes full, it is split into two nodes and the entries
evenly distributed between the old node and the new one.
Since we now have two nodes, there is a need for a
higher level in the tree to enable us to choose between
the two nodes when searching. This new node, in essence
will become our new root node. When a new root node is
created, it needs to contain an entry also. The method
to accomplish this is to move the entry that separates
the two lower level nodes into the root. This is called
promoting the entry.
By using these methods to construct a tree, we are
assured that the tree is always in balance, with regard
to height. The path length from the root to any leaf is
the same as the path length from the root to any other
leaf. Another positive aspect of this type construction
is that the entries that are promoted, are the entries
that we want to be in the root node. By working up from
the leaf level, splitting and promoting as nodes fill
up, we eliminate the problems encountered with ordinary
binary trees.
Because of the properties of a B-Tree, many of the
operations on the tree can be done recursively. The
simplest of these operations is searching. The search
through a B-Tree is divided into two distinctive parts.
First the node where the entry should reside is located.
Then the node itself is searched for the particular
entry.
The next operation to be considered is insertion of
new data into the B-Tree. First, the Insertion algorithm
must locate the node in which the new data item is to be
placed. Second, the algorithm searches for the
possibility that the data to be inserted is already
contained in the tree. If the data is not already in the
tree, the correct insertion point is located. At this
point the insertion is made. There is always the
possibility that the node where the data is to be
inserted may be full. If this is the case, the insertion
algorithm must be able to control the promotion and
splitting of the node. Depending on the contents of the
B-Tree, this promotion and splitting may filter up
through the higher nodes and may even cause a new root
node to be created.
The process of splitting and promoting used in the
insertion algorithm ensures that the rules of a B-Tree
be maintained. This is also the case for the deletion
algorithm. The removal of an entry must occur at the
leaf level. When the desired entry is located in a leaf,
simply delete that entry. If the entry is not in a leaf,
we swap the entry with its immediate successor and
delete it then.
At times, the deletion of an entry will cause the B-
Tree to experience underflow. Underflow is the condition
where the number of entries in a leaf is less than the
minimum number of entries allowed by that order B-Tree.
When underflow occurs, the first thing to do is to look
at the left sibling. If it has more than the minimum
number of entries allowed, redistribute the both nodes
entries so that they each contain at least the minimum
entries allowed. If neither sibling has more than the
minimum, concatenate the two leaves and along with their
median entry from the parent node into one leaf. When
concatenation occurs, you must check the parent to make
sure it does not underflow.
Another condition that can occur during deletion is
when the root experiences undeflow. When this happens
the remaining entry in the root is absorbed into a new
root. This in turn causes the height of the B-Tree to be
decreased.
These are just the essentials of B-Tree
implementation. Over the years since Bayer and
McCreight, the concept of the B-Tree has evolved into a
more efficient entities. One derivative, B*-Trees uses
enhanced splitting and redistribution processes so as to
increase the storage utilization. Others go a step
further and create Virtual B-Trees. Reguardless of the
implementation, the B-Tree as proposed by Bayer and
McCreight has become the de facto standard for indexes
into a data base system.
Sources:
File and Database Techniques,
James Bradley,
University of Calgary
File Design & Programming,
W. Wesley Peterson and Art Lew,
University of Hawaii at Manoa
File Management Techniques,
Billy G. Claybrook,
The MITRE Corporation
File Structures,
Michael J. Folik and Bill Zoellick,
Oklahoma State University
Pascal plus Data Structures,
Nell Dale,
University of Texas as Austin
Understanding Data Base Management Systems,
Joseph A. Vasta,
Bethlehem Steel Corporation